Chris Pollett > Old Classes > CS267

HW#4 --- last modified January 27 2019 04:59:24.

Solution set.

Due date: Nov 9

Files to be submitted:
  Hw4.zip

Purpose: Gain experience with TREC eval software, BM25, query processing, and index compression.

Related Course Outcomes:

The main course outcomes covered by this assignment are:

CLO3 -- Be able to explain where BM25, BM25F and divergence from randomness statistics come from.

CLO4 -- Give an example of how a posting list might be compressed using difference lists and gamma codes or Rice codes.

CLO6 -- Be able to evaluate search results by hand and using TREC eval software.
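
As an illustration of CLO4, the following is a minimal sketch of compressing a posting list with difference lists (gaps) and gamma codes. It assumes document ids are positive and strictly increasing, represents the bit stream as a Python string of "0" and "1" characters for readability, and uses the convention that the unary selector is a run of 0 bits terminated by a 1 bit. The function names are hypothetical and chosen only for this example.

```python
def to_gaps(postings: list[int]) -> list[int]:
    """Turn a strictly increasing list of doc ids into a difference list."""
    gaps = []
    prev = 0
    for doc_id in postings:
        gaps.append(doc_id - prev)  # the first gap is the first doc id itself
        prev = doc_id
    return gaps


def from_gaps(gaps: list[int]) -> list[int]:
    """Invert to_gaps by taking running sums."""
    postings = []
    total = 0
    for gap in gaps:
        total += gap
        postings.append(total)
    return postings


def gamma_encode(k: int) -> str:
    """Gamma code of k >= 1: unary length selector, then the binary body."""
    if k < 1:
        raise ValueError("gamma codes are only defined for integers >= 1")
    binary = bin(k)[2:]  # binary representation, starts with a 1 bit
    # len(binary) - 1 zeros followed by the leading 1 bit act as the unary
    # selector; the remaining bits of `binary` are the body.
    return "0" * (len(binary) - 1) + binary


def gamma_decode(bits: str) -> list[int]:
    """Decode a concatenation of well-formed gamma codewords."""
    values = []
    i = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the zeros of the unary selector
            zeros += 1
            i += 1
        # The next zeros + 1 bits (including the terminating 1) spell k.
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


postings = [3, 7, 8, 20]
encoded = "".join(gamma_encode(g) for g in to_gaps(postings))
decoded = from_gaps(gamma_decode(encoded))
```

Here the gaps are 3, 4, 1, 12, so small gaps between nearby document ids get short codewords. A real index would pack the bits into bytes rather than a string.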

Specification:

Do the following exercises:

  1. Download and compile the trec_eval program from NIST. Create a test_qrels_file.txt by hand for your corpus and topics from Hw2. Then rerun your Homework 2 experiments on a modified version of your Hw2 code, changed as follows: (a) do term-at-a-time query processing with accumulator pruning wherever possible (this cannot be done for proximity ranking); (b) replace the cos option with a bm25 option that ranks according to BM25 score; (c) output results in the format of a trec_top_file. The experiment I want you to conduct uses the topics and corpus of Hw2, but compares bm25 versus proximity rather than cosine similarity versus proximity. Your zip file should contain all the text files you generated for your experiments, and Hw4.pdf should include a write-up that explains what you did and summarizes the results you got. Your submission should also contain actual trec_eval output.
  2. Do Exercise 6.1 but use Pr["a"] = 0.7 and Pr["b"] = 0.3.
  3. Write pseudo-code for encoding and decoding `delta` codes.
  4. Do Exercise 6.7 but where `N/(N_T) = 163`.
  5. Do Exercise 6.9.
  6. Do Exercise 6.10.
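
For Exercise 1, the pieces fit together roughly as in the sketch below: term-at-a-time BM25 scoring with a cap on the number of accumulators, followed by output in the six-column results format that trec_eval reads (topic id, the literal Q0, document id, rank, score, run name). This is only one possible arrangement, not a reference solution. The in-memory index layout (term -> {doc_id: term frequency}), the function names, the base-2 logarithm, the defaults k1 = 1.2 and b = 0.75, and the simple pruning rule (process rare terms first and stop creating new accumulators once the limit is reached) are all assumptions made for illustration. Repeated query terms are counted once.

```python
import math


def bm25_taat(query_terms, index, doc_lengths, acc_limit=1000, k1=1.2, b=0.75):
    """Term-at-a-time BM25 with a cap on the number of accumulators.

    index: dict mapping term -> {doc_id: term frequency} (assumed layout)
    doc_lengths: dict mapping doc_id -> document length in tokens
    """
    n_docs = len(doc_lengths)
    avg_len = sum(doc_lengths.values()) / n_docs
    accumulators = {}  # doc_id -> partial score

    # Process rare (high-IDF) terms first so that they decide which
    # documents receive accumulators; break ties by the term itself.
    terms = sorted((t for t in set(query_terms) if t in index),
                   key=lambda t: (len(index[t]), t))
    for term in terms:
        postings = index[term]
        idf = math.log2(n_docs / len(postings))
        for doc_id, tf in postings.items():
            # Pruning: once the limit is hit, only update existing accumulators.
            if doc_id not in accumulators and len(accumulators) >= acc_limit:
                continue
            norm = k1 * ((1 - b) + b * doc_lengths[doc_id] / avg_len)
            contribution = idf * tf * (k1 + 1) / (tf + norm)
            accumulators[doc_id] = accumulators.get(doc_id, 0.0) + contribution

    # Highest score first; doc id breaks ties so the output is deterministic.
    return sorted(accumulators.items(), key=lambda kv: (-kv[1], kv[0]))


def trec_top_lines(topic_id, ranked, run_name):
    """Format ranked (doc_id, score) pairs as trec_eval results lines."""
    return [f"{topic_id} Q0 {doc_id} {rank} {score:.4f} {run_name}"
            for rank, (doc_id, score) in enumerate(ranked, start=1)]


# Toy data, made up for this example only.
index = {"cat": {"d1": 2, "d2": 1}, "dog": {"d2": 3}}
doc_lengths = {"d1": 4, "d2": 6, "d3": 5}

ranked = bm25_taat(["cat", "dog"], index, doc_lengths)
pruned = bm25_taat(["cat", "dog"], index, doc_lengths, acc_limit=1)
lines = trec_top_lines("1", ranked, "bm25")
```

Writing these lines for every topic to one file, and keeping the hand-made judgments in test_qrels_file.txt with one "topic_id 0 doc_id relevance" line per judged document, should allow a run along the lines of `trec_eval test_qrels_file.txt results_file.txt`.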

Point Breakdown

test_qrels_file.txt file as described. Code modified to output in trec_top_file format. (1/2pt each) 1pt
Implementation of term-at-a-time processing can be found in the code. 1pt
Accumulator pruning implemented. 1pt
trec_eval experiments and write-up in Hw4.pdf (0.5pt experiment data, 1pt write-up clearly stating the experiments performed (so someone could reproduce them), 0.5pt paragraph of conclusions drawn from the experiments and trec_eval results) 2pts
Exercises 2-6 (1pt each). 5pts
Total 10pts